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I 

Abstract 

This  paper  surveys  perturbation  theory  for  the  pseudo- inverse 
(Moore- Penrose  generalized  inverse),  for  the  orthogonal  projec- 
tion onto  the  column  space  of  a matrix,  and  for  the  linear 
least  squares  problem. 
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from  (lc)  that  the  matrix 

pa-m+ 

is  Hermitian  and  from  (lb)  that  PA  is  idempotent  and  R(PA)  - R(A). 

Hence  PA  is  the  orthogonal  projection  onto  R(A).  A similar  argument 
shows  that 

(1.2)  Ra  - A+A 

U 

is  the  projection  onto  R(A  ),  the  row  space  of  A. 

The  second  construction  is  the  solution  of  the  linear  least  squares 
problem  of  choosing  a vector  x to  minimize 

(1.3)  p(x)  - ||b-Ax||  2 , 

where  b is  a fixed  vector  and  ||  • II 2 denotes  the  usual  Euclidean  norm. 
The  solutions  of  this  problem  are  given  by 

(1.4)  x ■ A+b  ♦ (I-Ra)z  , 

where  z is  arbitrary.  When  A has  full  column  rank,  RA  - I and  the 
solution  x - A+b  is  unique.  Otherwise,  it  is  easily  verified  from  (1.1) 
and  (1.2)  that  Afb  is  orthogonal  to  (I-RA)z,  so  that  by  the  Pythagorean 
theorem 

llx|||  - ||A+b||  2 ♦ II  (I-Ra)z|||  . 

It  follows  that  x ■ A+b  is  the  unique  solution  of  (1.3)  that  has  minimal 


norm. 
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The  object  of  this  paper  is  to  describe  the  effects  of  perturbations 

4.  4. 

in  A on  A , on  PA  and  on  A b;  i.e.,  on  the  pseudo -inverse,  on  the 
projection  onto  R(A),  and  on  the  solution  of  the  linear  least  squares 
problem.  Such  descriptions  are  important  for  three  reasons.  First  the 
results  are  useful  mathematical  tools.  Second,  in  numerical  applications 
the  elements  of  A will  seldom  be  known  exactly,  and  it  is  necessary  to 
have  bounds  on  the  effects  of  the  uncertainties  in  A.  Finally  many 
nunerical  processes  for  computing  projections  and  least  squares  solutions 
behave  as  if  exact  computations  had  been  performed  on  a perturbed  matrix 
A + E,  where  E is  a small  matrix  whose  size  depends  on  the  algorithm 
and  the  arithmetic  used  in  its  execution. 

We  shall  be  concerned  with  three  kinds  of  results:  perturbation 

bounds,  asymptotic  expressions,  and  derivatives.  The  perturbation  bounds 
are  needed  in  the  applications  mentioned  above.  Asymptotic  expressions 
and  derivatives  are  useful  computational  tools  when  the  perturbation  is 
actually  known.  Moreover  they  can  be  used  to  check  the  sharpness  of  the 
perturbation  bounds.  Not  surprisingly  it  is  rather  difficult  to  obtain 
a reasonably  sharp  perturbation  bound  that  tells  the  complete  story  of  the 
effects  of  the  perturbations.  Asymptotic  forms  and  derivatives  are  easier 
to  come  by. 

In  order  to  make  this  survey  reasonably  self-contained,  we  begin  in 
12  with  a review  of  the  necessary  background.  In  §3  we  develop  the  pertur- 
bation theory  for  the  pseudo- inverse,  in  S4  for  the  projection  PA,  and 
in  §5  for  the  least  squares  solution  A+b. 
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Notes  and  references.  For  background  on  the  generalized  inverse 
see  the  books  by  Ben-Israel  and  Greville  (1974),  Boullion  and  Odell  (1971), 
and  Rao  and  Mitra  (1971).  The  expression  (1.1)  is  due  to  Penrose  (1955, 
1956)  whose  papers  initiated  the  current  interest  in  the  pseudo -inverse. 

Many  articles  on  perturbation  theory  for  pseudo -inverses  and  related 
problems  have  appeared  in  the  literature.  To  date  the  most  complete 
survey  of  the  problem  has  been  given  by  Wedin  (1973).  In  addition  to 
collecting  and  unifying  earlier  material,  this  paper  will  present  some 
new  results. 

2.  Preliminaries 

Notation.  Throughout  this  paper  we  shall  use  the  notational  conven- 
tions of  Householder  (1964) . Specifically,  matrices  are  denoted  by  upper 
case  Latin  and  Greek  letters,  vectors  by  lower  case  Latin  letters,  and 
scalars  by  lower  case  Greek  letters.  The  symbol  C denotes  the  set  of 

complex  numbers,  t11  the  set  of  complex  n- vectors,  and  tmxn  the  set  of 

H 

complex  m x n matrices.  The  matrix  A is  the  conjugate  transpose  of  A. 
The  column  space  of  A is  denoted  by  R(A),  and  its  orthogonal  complement 
by  R(A)1. 

We  shall  be  concerned  with  a fixed  matrix  A ( Cmxn  with 

rank  (A)  * r . 

The  matrix  E « l^xn  will  denote  a perturbation  of  A and  we  shall  set 


B - A ♦ E . 
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Since  we  are  concerned  with  the  geometry  of  C11,  we  shall  be  at  some 
pains  to  cast  our  results  in  such  a way  that  they  are  not  affected  by  uni- 
tary transformations  (cf.  the  section  on  unitarily  invariant  norms  below). 
We  may  use  this  fact  to  transform  our  perturbation  problems  into  a simpler 
form.  Specifically,  let  U = (Uj.l^)  € be  a unitary  matrix  with 
R(Uj)  * R(A)  and  let  V ■ (Vpl^)  be  a unitary  matrix  with  R(Vj)  = R(AH). 
Then  l/W  has  the  form 

(2.1) 


where  A^  € Crxr  is  nonsingular.  We  shall  partition  l/W  and  l/W  con- 
formally with  l/W: 


i/W  - 1 

Eu 

E12 

l E21 

E22 

/ B 

i/W  - 

B12 

)■ 

\B21 

B22  , 

/ 

These  forms  will  be  called  reduced  forms  of  the  matrices  A,  B,  and  E,  and 
in  the  sequel  we  shall  often  assume  that  the  matrices  are  in  reduced  form. 
In  this  case,  the  pseudo- inverse  is  given  by 
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Singular  values.  It  is  a well-known  result  that  in  the  reduced  form 
(2.1)  the  matrices  and  may  be  chosen  so  that 

= Z ■ diagCa^^, . . . ,or)  t 

where 


a.  > o,  > . . . > o > 0 . 
l l r 

This  reduced  form  is  called  the  singular  value  decomposition  of  the  matrix 
A,  and  the  numbers  ck  are  called  the  singular  values  of  A.  From  the 
relation  (2.2)  and  the  fact  that  (U^AV)^  * V^A^U,  it  follows  that 


The  i-th  singular  value  of  a matrix  A,  which  will  be  denoted  by 
cr^(A),  can  be  written  in  the  form 

(2-3)  a.  (A)  - sup  inf  ||Ax|L,  (i  - l,2,...,n)  , 

1 dim(X)-i  x€X  c 
llx|l2»l 

where 

(2.4)  llyll 2 - 

is  the  usual  Euclidean  norm.  This  characterization  provides  a natural 
convention  for  nunbering  the  singular  values  of  a rectangular  matrix: 

A € Cmxn  has  n singular  values  of  which  n-r  are  zero;  AH  has  m 
singular  values  of  which  m-r  are  zero.  The  nonzero  singular  values  of 
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A and  are  the  same. 

Two  inequalities  that  we  shall  need  in  the  sequel  follow  fairly 
directly  from  (2.3).  They  are 

ctj(A)  - o1(E)  < ct^A)  < ct^A)  + ct1(E) 

and 

(2.5)  o.(AC)  <oi(A)a1(C),  o^a.fC)  . 

Unitarily  invariant  matrix  norms.  A norm  on  C1"*11  is  a function 
INI:  4™  -+F  that  satisfies  the  conditions 


1. 

A f 0 — * ||A||  > 0 , 

(2.6) 

2. 

lloAII  ■ |a|||A||  , 

3. 

l|A+B||  < ||A||  + ||B||  . 

A norm  ||*||  is  unitarily  invariant  if 

lll/Vll  - IIAII 

for  all  unitary  matrices  U and  V.  The  perturbation  bounds  in  this  paper 
will  be  cast  in  terms  of  unitarily  invariant  norms,  whose  properties  will  now 
be  described. 

Let  U and  V be  the  unitary  matrices  realizing  the  singular  value 
decomposition  of  the  matrix  A € Iinxn.  Then  for  any  unitarily  invariant  norm 


m,n 
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Thus 


HAH. 


m,n 


is  a function  of  the  singular  values  of  A,  say 


(2.7) 


HAH. 


m,n 


Vn(al*°2 ax? 


It  follows  from  (2.6)  that  <p  „ regarded  as  a function  on  Fn  is  a 

m,n 

norm.  Since  the  interchange  of  two  rows  or  two  columns  of  a matrix  is  a 
unitary  transformation  of  the  matrix,  the  function  <pm  n is  symmetric  in 
its  arguments  ct1,c-,...,ct  . It  can  also  be  shown  that  <p  is  nondecreas- 
ing  in  the  sense  that 


(2.9) 


0 S < o| 


(i“l,2 n) 


Vn(CTl’ 


*an}  5 *m,n(ai»' 


We  shall  say  that  the  norm  IHI  is  generated  by  <p 

m,n  a m,n 

An  important  norm  is  the  spectral  norm  ||*||2  generated  by  the  function 
<p  defined  by 

(CTJ  * * * * *®fl)  M ^3^  ( I I » • • • » I I ) • 

This  norm  can  also  be  defined  by  the  equation 

(2.10)  l|A||2  - sup  ||Ax||2  , 

L llx|l2-l  L 

where  IHI2  on  the  right  denotes  the  Euclidean  norm  defined  by  (2.4). 

The  spectral  norm  satisfies  an  important  consistency  relation  with  other 
unitarily  invariant  norms.  If  ||*||  is  a unitarily  invariant  norm  generated 
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by  <p,  then  it  follows  from  (2.5)  and  (2.9)  that 
(2.11)  IICDII  < ||C||2I|D||,  ||C||||D||2 

whenever  the  product  CD  is  defined.  In  particular  INI 2 is  consistent 
with  itself  over  matrices  and  vectors  of  all  dimensions. 

A second  example  of  a unitarily  invariant  norm  is  the  Frobenius  norm 
generated  by  the  function 

2 2 2 

^>F^al,CT2>  * * * ,CTn^  " (o^+ o 2+. . ,+ct^)  . 

For  any  matrix  A € Cra><n 

2 m n 2 11 

HAlIp  - l l |af.|  = trace  (AHA)  . 

i=l  j=l  1J 

The  Frobenius  norm  satisfies  the  consistency  relation 


IICDIIp  < IICIIpHDHp  . 

Since  we  shall  be  dealing  with  matrices  of  varying  dimensions,  we  shall 

00 

work  with  a family  of  unitarily  invariant  norms  defined  on  U dnxn.  It  is 

m,n=l 

important  that  the  individual  norms  so  defined  interact  with  one  another 
properly.  Accordingly,  we  make  the  following  definition. 

oo 

Definition  2.1.  Let  ||*||:  U Cmxn  ■*  R be  a family  of  unitarily 

m,n“l 

invariant  norms.  Then  ||*||  is  uniformly  generated  if  there  is  a symnetric 
function  4),  defined  for  all  infinite  sequences  with  only  a finite  number 
of  nonzero  terms,  such  that 
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llAll  * <p(a1(A),a2(A) CTn(A),0,0,...) 

for  all  A € c"1*11.  It  is  normalized  if 

llxll  = ||x||2 

for  any  vector  x. 

The  function  <p  in  the  above  definition  must  satisfy  the  conditions 
(2.6).  Any  norm  defined  by  such  a function  can  be  normalized.  Indeed  we  have 

llxll  * <p(<7j(x)  ,0,0, . . .)  = <P ( ||x|| 2 , 0 , 0 , . . .)  , 

from  which  it  follows  that  ||x||  = m-IIx||2  for  some  constant  p.  that  is  inde- 
pendent of  the  dimension  of  x.  The  function  p then  generates  the 
normalized  family  of  norms. 

A uniformly  generated  family  of  norms  has  some  nice  properties.  First, 
since  the  nonzero  singular  values  of  a matrix  and  its  conjugate  transpose  are 
the  same,  we  have 

IIAH||  - IIAII  . 

Second,  if  a matrix  is  bordered  by  zero  matrices,  its  norm  remains  unchanged; 
i.e. , 

(000 
0 A 0 

0 0 0 

In  particular  if  A is  in  reduced  form,  then 


i 
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HAH  * HA^I 

and 

HA+||  = HA^Jll  . 

A third  property  is  that  if  INI  is  normalized  then 

(2.13)  HAH 2 < ||A||  • 

In  fact  from  (2.11)  and  the  fact  that  ||x||  = |)x||2,  we  have 

(2.14)  ||Ax||2  = ||Ax||  < ||A||||x||2 

for  all  x.  But  by  (2.10)  ||A|| 2 is  the  smallest  number  for  which  (2.14) 
holds,  from  which  (2.13)  follows.  A trivial  corollary  of  (2.11)  and  (2.14) 
is  that  11*11  is  consistent: 

IICDII  < IICIItIDII  . 

Finally  we  observe  that 

(2.15)  Vx  ||Cx||2  < ||Dx||2  =^I|C||  < ||D||. 

To  prove  this  implication  note  that  by  (2.3)  the  hypothesis  implies  that 
o^(C)  < ct^(D) . Hence  the  inequality  ||C||  < ||D||  follows  from  (2.9). 


In  the  sequel  ||*||  will  always  refer  to  a uniformly  generated, 
normalized,  unitarily  invariant  norm. 
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0 i 


Perturbation  of  matrix  inverses.  We  shall  later  need  some  results 
on  the  inverses  of  perturbations  of  nonsingular  matrices.  These  are 
sunmarized  in  the  following  theorem. 

Theorem  2.2.  Let  A be  nonsingular  and  suppose  that 

IIA-1 1| 2 l|E||  < 1 . 

Then  A ♦ E is  nonsingular, 


IKA+E) 


-1||  < M 


-1, 


r ’ 


and 

(2.16) 

where 

(2.17) 

and 


|(A+E)~1-A~1II  _ k 


IIA 


, _ ' E|| 

-TJ]  r W 


< = IIAIIHA 


-1, 


In  most  applications  of  Theorem  2.2  the  number  y is  insignificantly 
different  from  unity.  The  number  < is  usually  called  the  condition  number 
of  A (with  respect  to  inversion).  It  measures  the  sensitivity  of  the 
inverse  of  A to  perturbations  in  A.  Similarly  defined  quantities  will 
play  similar  roles  in  the  perturbation  theory  for  the  pseudo- inverse. 


a 
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Projections.  We  have  already  observed  that  the  orthogonal  projections 
PA  and  Ra  onto  the  column  space  and  the  row  space  of  A can  be  expressed 
in  terms  of  the  pseudo- inverse.  The  projection  onto  RCA)"1-  will  be  denoted 
by 


Likewise 

= 1 - R. 

A A 

H 

will  denote  the  projection  onto  R(A ) . 

When  A is  in  reduced  form,  its  projections  can  be  easily  written  out: 


Xr  0 


PA  = 


€ CmXm  , 


It  follows  that 


and 


Ir  0 


R* 


cnxn  . 


\o 


||PaARa||  = ||Anl| 


HPAERAII  = IIEUII  , IIPAERA||  = ||E12ii 


<eraii  - ||E21||  , iipXii  = IIE: 


22" 
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These  identities  enable  us  to  pass  from  results  for  the  reduced  form  to 
general  results  stated  in  terms  of  projections  of  A and  E. 

We  shall  need  some  properties  of  norms  of  projections  later.  These 
are  sunmarized  in  the  following  theorem. 

Theorem  2.3.  For  any  A and  B the  following  statements  are  true. 

1.  If  rank  (A)  * rank  (B),  then  the  singular  values  of  PAP^"  and 
PBPA  are  the  same  so  that 

“PAPB»  ' »PBP>  ' 

Moreover  the  nonzero  singular  values  a2  of  PAPg  correspond 
to  pairs  ±a  of  eigenvalues  of  Pg  - PA,  so  that 

^PB*PA^2  = ^^2  “ ^8^2  * 

2.  If  ||PB-PAI!  2 < 1,  then  rank  (A)  = rank  (B). 

3.  If  rank  (B)  > rank  (A),  then 

»PBIA»  - <PA»  • 

Proof.  Proofs  of  parts  one  and  two  are  readily  found  in  the  literature. 
For  part  three  write  Pg  - Pj  ♦ P2  where  rank  (P^  «=  rank  (A)  and  ?ff2  = 0 
(i.e.  R(P2)  is  orthogonal  to  R(A)).  Then 

^A^11  " l!PA<I-PrP2>ll  " HPA(I-P1^I  * Hpipill  . 

the  last  equality  following  from  part  1.  Now  for  any  x 
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lIPjI&ll  s l|PBP^x||  , 

and  the  result  follows  from  (2.15).  □ 

When  B - A ♦ E,  we  can  estimate  UPgP^II  in  terms  of  E. 

Theorem  2.4.  The  product  PgPA  can  be  written  in  the  form 

(2.18)  PBPA  = * 

Hence 

(2.19)  HPgpJl  5 ll^l|E||  , 
and  if  rank  (A)  = rank  (B) , then 

(2.20)  UPgP^II  < min  {||B+|^||A+|^}  ||E||  . 

Proof.  We  have 

Va  - 

- (B+)H(A+E)HP^  = (B+)HEHP^ 

- (B+)HBH(B+)HEHP^  = (B+)HRbEHP^  , 

which  establishes  (2.18).  The  inequality  (2.19)  follows  upon  taking  norms 
in  (2.18).  Finally  (2.20)  follows  from  part  1 of  Theorem  2.3.  a 

Theorems  2.3  and  2.4  have  obvious  analogues  for  other  combinations  of 
projectors  (e.g.  K^R-A  ■ -AfEK^).  In  the  sequel  a reference  to  these  theorems 
will  also  cover  any  trivial  variants. 
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The  case  when  llpB'pAll2  < * ^ particularly  important  later. 

We  have  seen  in  part  2 of  Theorem  2.3  that  in  this  case  rank  (A)  = rank  (B). 

However  more  is  true:  no  vector  in  R(A)  can  be  orthogonal  to  R(B)  and  vice 

versa.  For  suppose  that  x + 0 satisfies  PAx  * x and  PgX  = 0.  Then 
(pB‘pA)x  ■ -x.  which  implies  that  llpB-pAll2  2 1.  Conversely  if  llpB-pAH2  * 1 
then  there  is  a vector  in  R(A)  or  K(B)  that  is  orthogonal  to  R(B)  or 
R(A).  To  see  this,  note  that  by  Theoran  2.3.1  there  is  a vector  x 
such  that  (pB-pA)x  * x.  If  PAx  - 0 then  PgX  ■ x which  shows  that  x ( RfB) 
and  x € R(A)‘L.  If,  on  the  other  hand,  PAx  + 0,  then  since  PAx  = -(I-Pg)x 

we  have  pB(pAx)  ■ 0,  which  shows  that  PAx  € R(A)  and  PAx  € RfB)1*. 

Because  of  the  above  considerations,  we  shall  say  that  R(A)  and  R(B) 
are  acute  whenever  llpB-pAll2  < 1.  The  following  theorem  gives  sufficient 
conditions  for  R(A)  and  R(B)  to  be  acute. 

Theorem  2.5.  If  rank  (A)  ■ rank  (B)  and 


Wa\\\pabRaW2  < l , 

then  R(A)  and  R(B)  are  acute. 

Proof.  We  shall  use  the  reduced  form.  From  Theorem  2.2  it  follows 
that  Bjj  ■ Ajj  ♦ Ejj  nonsingular.  Hence 


rank 


/ B \ 

/ 11 

Wl 

rank  (A)  “ rank  (B)  . 


It  follows  that 
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But 


from  which  it  is  easily  seen  that  no  vector  in  R(A)  can  be  orthogonal  to 
R(B)  and  vice  versa,  o 


Theorem  2.4  shows  that  if  rank  (A)  * rank  (B)  the  spaces  R(A)  and 
R(B)  are  acute  whenever  E is  sufficiently  small.  For  this  reason  we  shall 
say  that  B is  an  acute  perturbation  of  A if  A and  B satisfy  the 
hypotheses  of  Theorem  2.4.  The  reader  should  remember  that  the  statement 
"B  is  an  acute  perturbation  of  A"  is  stronger  than  the  statement  "R(B)  and 
R(A)  are  acute." 

Notes  and  References.  The  properties  of  singular  values  are  well  known. 
See  Stewart  (1973)  for  an  introduction  and  Gohberg  and  Krein  (1965)  for  a 
more  detailed  treatment  in  an  infinite  dimensional  setting. 

Von  Neunann  (1937)  was  the  first  to  prove  that  unitarily  invariant 
norms  can  be  written  as  a function  of  singular  values  (the  function  cpm  n 
in  (2.7)  is  usually  called  a symmetric  gauge  function).  Systematic  treat- 
ments of  unitarily  invariant  norms  may  be  found  in  Mirsky  (1960)  and  Gohberg 
and  Krein  (1965). 
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The  treatment  of  unitar ily  invar iant  norms  in  finite  dimensional 
spaces  has  often  been  a little  sloppy.  In  infinite  dimensional 
settings  there  is  usually  only  one  space  and  one  generating  function,  and 
the  same  is  true  in  a finite  dimensional  setting  when  one  is  concerned  with 
square  matrices.  However,  when  one  considers  rectangular  matrices  with 
varying  dimensions,  different  norms  can  be  used  for  different  dimensions, 
and  there  is  no  reason  why  these  norms  should  interact  nicely.  How  bad 
things  can  get  is  illustrated  by  the  family  or  norms  ||  ||  defined  for 
A € ^ by 

HAH  - £ l|A||2  . 

II 

This  family  is  unitarily  invariant  and  consistent,  but  ||A  ||  / ||A||,  unless 
A is  square,  and  the  relation  (2.15)  does  not  hold  in  general.  Definition 
2.1  represents  a return  to  the  simplicity  of  the  infinite  dimensional  case. 

Theorem  2.2  is  classical  and  is  usually  proved  by  an  appeal  to  the 

-1  2 

Neumann  series  representation  (I-A)  - I ♦ A ♦ A ♦ . . . . Wilkinson  (1965) 

gives  a proof  that  does  not  use  series  and  discusses  at  some  length  the 
notion  of  condition  number.  The  result  is  usually  proved  under  the  assumption 
that  || 1 1|  * 1;  however,  the  proofs  can  be  extended  to  establish  the  result  for 
any  consistent  norm. 

The  results  in  Theorem  2.3  are  well  known  to  people  who  work  closely 
with  orthogonal  projectors  (for  proofs  see  Afriat  (1957)  or  Wedin  (1969)). 

The  decomposition  in  Theorem  2.3  was  established  in  a slightly  weaker  form 
by  Wedin  (1973).  In  some  cases,  when  E is  small,  Rg  will  be  near 
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and  the  approximation  ||P^ERg||  HE^^II  will  be  more  realistic  in  (2.19). 

The  number  IIPb‘pa^2  *s  close^y  related  to  various  measures  of  separa- 
tion between  subspaces.  See  Kato  (1966)  and  especially  Davis  and  Kahan 
(1970)  where  further  references  may  be  found.  Theorem  2.4,  with  ||P^ER^|| 
replaced  by  ||E||,  is  proved  by  Wedin  (1973).  The  term  "acute"  ordinarily 
refers  to  the  angle  subtended  by  two  line  segments,  not  to  the  segments  them- 
selves, and  it  is  technically  misapplied  when  subspaces  are  said  to  the 
acute.  But  this  usage  will  cause  no  confusion  and  it  is  better  than  the  ugly 
phrase  "in  the  acute  case."  The  term  "acute  perturbation"  is  new. 

3.  The  Pseudo- Inverse 

In  this  section  we  shall  consider  the  problem  of  bounding  ||B+-  A+|| 
in  terms  of  ||E||.  We  shall  obtain  three  basic  theorems:  one  for  when 

rank  (A)  f rank  (B),  one  for  when  rank  (A)  = rank  (B),  and  one  for  when 
B is  an  acute  perturbation  of  A.  All  these  theorems  are  based  on  expres- 
sions for  B+,  which  also  yield  asymptotic  expressions  for  B and  expres- 
sions for  the  derivative  of  Af. 

Lower  bounds.  Before  proceeding  to  obtain  bounds  on  ||B+-  A+||,  we 
shall  show  how  bad  things  can  be  by  deriving  lower  bounds. 

Theorem  3.1.  If  R(A)  and  R(B)  are  not  acute,  then 
(3.1)  hb+-a+ii2  * 1/||E||2  . 

If  further  rank  (B)  > rank  (A) , then 
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(3.2)  ||B+H2  > 1/||E|!2  . 

Proof.  Suppose  for  definiteness  that  rank  (B)  > rank  (A).  Then 
there  is  a vector  y € R(B)  with  ||y||2  = 1 such  that  y € R(A)a.  Thus 

1 - A * ABy  - Ae+y  - Aa+e)bV 

* yVy  < ||E||2llB+yll2  . 

which  shows  that  ||B+y|| 2 • and  hence  ||B+Il2  is  not  less  than  1/ ||E|| 2 . From 

■j* 

this  and  the  fact  that  A y * A P^y  - 0 we  have 

ygq  5 l(B+y|| 2 5 II  (B+-A+)y|l2  < ||B+-A+||.  o 

Theorem  3.1  shows  that  the  pseudo- inverse  of  a general  matrix  is  not 
a continuous  function  of  its  elements,  unless  the  class  of  perturbations 
is  restricted.  It  also  says  that  if  two  nearby  matrices  do  not  have  acute 
column  spaces,  then  one  of  them  at  least  must  have  a large  pseudo- inverse. 
Moreover  if  they  are  of  the  same  rank,  then  both  of  them  must  have  large 
pseudo - inverses . 

A decomposition  of  B+  - A+.  In  spite  of  the  negative  results  in 

4*  + 

Theorem  3.1,  it  is  possible  to  obtain  bounds  on  II B -A  ||  in  the  general  case, 
although  these  bounds  need  not  remain  finite  as  B approaches  A.  The 
basis  for  obtaining  such  bounds  is  contained  in  the  following  theorem. 

Theorem  3.2.  The  following  two  decompositions  of  B+  - A+  are  valid: 
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(3.3)  B+  - A+  = -B+PgERAA+  + B+PgPA  - RjRAA+  , 

(3.4)  B+  - A+  = -B+PbERaA+  + (BHB)+RgEHPj  - R^EHPa(AAH)+  . 

Proof.  Both  expressions  can  be  verified  directly  by  replacing  E 
with  B - A,  replacing  the  projectors  by  their  expressions  in  terms  of 
pseudo- inverses,  and  simplifying.  □ 

It  should  be  noted  that  (3.3)  can  be  obtained  directly  from  (3.2) 
by  using  Theorem  2.4  to  express  PgPA  and  RgRA  in  terms  of  E. 

The  general  theorem.  We  are  now  in  a position  to  prove  the  general 
theorem  bounding  ||B+-A+||. 

Theorem  3.3.  For  any  A and  B with  B * A + E, 

||B+-A+||  < n max  {||A+||,  ||B+|||>  ||E||  , 

where  is  given  in  the  following  table: 


Ml 

arbitrary 

spectral 

Frobenius 

4 

3 

lVS 

VI 

Proof.  The  proof  is  a slight  modification  of  the  proof  given  by  Wedin 
(1973).  We  shall  give  only  the  proof  for  the  Frobenius  norm. 

Suppose  for  definiteness  rank  (B)  s rank  (A).  Let  F^^,  F3 
denote  the  three  terms  on  the  right-hand  side  of  (3.3).  Then  the  column 
spaces  of  F^  and  are  orthogonal  to  the  column  space  of  Fj.  Hence 


22  - 


(3.5)  l|B+-A+||p  = HFj+F^lp  * ||F3I||  . 

Now  since  Fj  ♦ F2  = B+ (PgDA+PA+PgPA) , 

||F1+F2llp  < !|B^(||PgEA+PAl|p+||PgPA||p)  . 

But  from  Theorems  2.4  and  2.5 

iipbea+paii2  ♦ iipbp;«p 

s ||PbEA^  . l!PjPAllp 
= IIPgEA^  + ||PgEA+|| 

= IIEA^I  < ||E|!p||A+|| I • 


Hence 


(3.6) 


||F1+F2IIf  5 l|A+||2l|B+||2||Ef|p 


Also  from  Theorem  2.5 


(3.7) 


IIFjllp  3 IIA+|I2I!RJRaI!f  - ||At||2l|RARj||p 
- ||A+|l2l|A+ERj||F  < ||A+!^||E||p  , 


and  the  result  follows  on  combining  (3.3),  (3.6),  and  (3.7).  Since  the 

final  bound  is  synmetric  in  A and  B,  it  also  holds  when  rank  (B)  > rank  (A).  □ 

4.  4. 

It  should  be  noted  that  these  bounds  do  not  imply  that  i|B  -A  II  is  small 
when  ||E||  is  small,  since  B+  may  grow  unboundedly  as  E approaches  zero. 
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The  case  rank  (A)  = rank  (B).  When  A and  B have  the  same  rank, 

we  can  strengthen  Theorem  3.3  in  two  ways.  First,  we  can  replace  the  term 
+ 2 +2 

max  { ||A  H^.llB  ||2>  with  the  product  ||A||2I|B||2.  Second  we  can  distinguish 
more  cases  for  the  constant  4.  In  the  following  theorem  recall  that 
A ( CmXn  with  m > n. 


Theorem  3.4.  If  rank  (A)  = rank  (B) , then 
(3.8)  ||B+-A+||  < 4llA+H2l|B+||2l|E||  , 


where  4 is  given  in  the  following  table. 

Arbitrary Spectral Frobenius 

rank  (A)  < min(m,n)  3 (1+  VZ)/2  V2 

rank  (A)  = min(m,n) 

m + n 2 \fl  1 

rank  (A)  = m = n 1 1 1 

The  proof  of  this  theorem  may  be  found  in  Wedin  (1973).  The  bound  (3.8) 
may  be  recast  in  the  form 


(3.9) 


where 


IIB^A1-!! 

iib+ii2 


9 


< = IIA||||A+||2  . 

In  this  form  the  result  is  almost  analogous  to  the  bound  (2.16)  for  the 
inverse  in  Theorem  2.2.  The  bound  (3.9)  also  implies  that  as  E approaches 


r 
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zero,  the  relative  error  in  B+  approaches  zero,  which  further  implies 
that  B+  approaches  A+.  Remembering,  on  the  other  hand,  that  if 
rank  (B)  t rank  (A)  then  R(A)  and  R(B)  cannot  be  acute,  we  have  from 
Theorem  3.1  the  following  corollary  of  Theorem  3.4. 

Corollary  3.5.  A necessary  and  sufficient  condition  that 

lim  B+  = A+ 

B-A 

is  that  rank  (B)  = rank  (A)  as  B approaches  A. 


Acute  Perturbations.  It  is  evident  from  the  proofs  of  Theorems  3.3 
and  3.4  that  we  have  given  away  much  in  deriving  the  bounds.  In  particular, 
if  B is  a small  acute  perturbation  of  A then  P A and  PR  are  nearly 
equal,  and  the  same  is  true  of  R^  and  Rg.  Thus  it  follows  from  (3.4) 
that  B - A can  be  decomposed  into  three  terms--one  essentially  depending 
on  P^ER^ , one  on  P^ER^,  and  one  on  P^ER^ . However,  this  does  not  tell  the 
whole  story;  for  we  shall  show  that  the  dependency  of  B+  - A on  P^ER^ 
and  P^ER^  is  bounded,  no  matter  how  large  these  projections  may  be. 

In  order  to  state  our  theorems  concisely,  we  must  first  introduce  some 
additional  notation.  Let  II* II  be  generated  by  <p  and  for  any  F ( C^*1" 

(k  > r)  define 


VF) 


OjCF) 


L (F) ] 1/2 


ar(F) 


[l+aJ(F)]1/: 


(3.10) 


» • • • 9 
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The  function  ^ is  not  a norm;  however,  it  has  some  useful  properties. 
First,  from  (2.5)  and  the  monotonicity  of  <p, 

t(GF)  < t(l!G||2F)  < t(IIGIIF)  . 

Second,  since  for  a > 1 


a a 


(l+aa2)1/Z  (1 +ct2)1/2 


a a 

T, 


we  have 


a > 1 t^(aF)  5 (F) 


For  small  F,  ^(F)  is  asymptotic  to  ||F|I: 


^(F)  = ||F||  + o(HFII) 


For  large  F,  t is  bounded: 


^(F)  5 |IIr|| 


Finally,  for  the  spectral  norm 


*2(F) 


IIF|I2 

(1+||F!|2)1/2 


Our  first  result  concerns  a rather  special  matrix. 


Lemma  3.6.  The  matrix 


I 

F 
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I 


satisfies 

(3.11) 

and 

(3.12) 


Proof. 

(3.13) 


It  is  easily  verified  that 


(^j  = (I+FHF)'1(I  f“)  , 


whose  singular  values  are 


2 1 ~ 1 ’ 

[l+aJ(F)r 

from  which  (3.11)  follows.  Also  if 


then 


GG*1  « I - (I+F^)"1 


It  follows  that  the  singular  values  of  G are  given  by 

aA(F) 

(l-o.  (F)p7 


which  establishes  (3.12).  o 


7 


- 27  - 

The  main  result  is  based  on  an  explicit  representation  of 
shall  work  with  the  reduced  forms  of  A and  B. 

Theorem  3.7.  Let  B be  an  acute  perturbation  of  A.  Then 


(3.14) 

where 


B+  - (I  F,2)+Bn 


■a  ■ 


F = F B F = 

21  C21°ll  » r12  B11L12 


Proof.  As  in  the  proof  of  Theorem  3.4,  we  have 

R(B)  = r| 


Thus  the  columns  of 


€;:)] 
(£) 


can  be  expressed  as  a linear  combination  of  the  columns  of 


(i;;) 


Since  ^B11E12^  * E12*  we  11,1151  ^iave 


. We 


from  which  it  follows  that 
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(3.15) 


fi2)  • 


The  result  now  follows  from  Penrose's  conditions,  c. 


It  is  interesting  to  observe  that,  from  (3.15), 

B22  " n22  * r21BllF12  ’ 

which  is  of  second  order  in  T . In  other  words,  if  rani  '• 
then  P*r.R*  must  approach  zero  quadrat ically  as  ! appro. i ■- 
We  turn  now  to  the  perturbation  theorem. 


Theorem  3.8.  Let  B be  an  acute  perturbation  of  A.  let 


k » ’!A!:  "A'  II, 


and  let 


ilEil'!  A 

Y = 1 - < ! 1 ^ " — > 0 


Then 


and 

(3.16) 


II B 


l|A‘ 

Y 


l»V»  . « "Hi  . ,(*  " V \.,(*  rn  \ 

Y ~TK~  *»\7  TSU  ) *+.\7T-/  ' 


where  \|»  is  defined  by  (3.10). 


Proof.  Let  be  defined  as  in  Theorem  3.7.  Let 


ran! 


It  follows  from  Theorem  2.2  that 


i-l, 


-1  -1  llA-iiH  IIa^II 

IIBjJll  - IKAjj-Ejj)  1|| 

t -1  + 

From  (3.14),  B = ^12®11  J21’  ^rQm  Lenma  3,6 


iib+ii  s iu12ii2iib"Jiihj|1ii2  s iib’Jh  < m!i 


Now  from  (3.14) 


(3.17) 

D+  A+  r T+  T+  N « “ 1 y f a.  T+  A “ 1 /■  y"f* 

B 'A  (J12"I12)A11I21  J12A11^J21 

From  Theorem 

2.2  we  have  the  following  bound: 

(3.18) 

t -1  -1  -1  K llEllH 

»J12»l}-All)>  5 »An!l2  r wnr 

By  Lemma  3.6 

(3.19) 

|l(Jn"I12)AnI21|l  s »Ail»2l|J12-I1211 

l|All|l2'1'<p^BllE12) 

-1  k l|E1211 

^l'^Vy  "W5  * 


< 
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f 

and  likewise 


(3.20) 


IIJ12AilfJ2l'I21)l1  5 


iie21ii 

HaT 


■)  • 


The  bound  (3.16)  follows  on  combining  (3.17),  (3.18),  (3.19),  and  (3.20) 
and  remembering  that  llAj j||  = IJA^Ii.  o 


The  bound  (3.16)  gives  a rather  nice  dissection  of  B -A  . Asympto- 
ticallv,  for  F.  small,  it  reduces  to  the  bound  that  would  be  obtained  by 
taking  norms  in  (3.4);  i.e.. 


i|Bt-Atl: 

IIA1 


K HE11IKI|E12«*I|E21II 

7 w 


However,  the  bound  additionally  shows  that  Ej  ^ and  E^  can  have  at  most 
a bounded  effect  on  B -A  ||. 

When  A is  square  and  nonsingular,  Ep  and  E^^  are  void,  and  the 
bound  reduces  to  that  of  Theorem  2.2.  Note  that  the  number  k,  defined 
in  analogy  with  (2.17),  plays  an  analogous  role  here. 


Asymptotic  forms  and  derivatives.  Asymptotic  forms  for  B may  be 
obtained  from  either  (3.4)  or  (3.14).  Of  course  for  B+  to  approach  A 
we  must  have  rank  (A)  ■ rank  (B);  and  since  we  are  assuming  that  E is 
arbitrarily  small,  by  Theorem  2.5  we  have  that  B is  an  acute  perturbation 
of  A.  In  this  case 


Bf  - A+  ♦ O(IIFJI)  , 


and 
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PB  = BB+  = (A+E)[A++0(||E||)]  = PA  + OCIIEII) 


with  similar  expressions  for  the  other  projections.  Hence  from  (3.4) 


Bf  - A+  - A+P,ER,Af  +(AHA)tR.EflP 


A A 


(3.21) 


- R^EHPa(AAH)+  + OdlEll2)  . 


It  follows  immediately  from  (3.21)  that  if  A(t)  is  a differentiable 


function  of  t with 


rank  [A(t)]  = rank  [A(t')] 


for  all  t,  then  A (t)  is  a differentiable  function  of  t and 


(3.22) 


-A^P  ^ R A^  + fA^AI^R  ^ pA 
A *A  37  V AJ  RA  fA 


* RA  PA 


The  asymptotic  form  obtained  from  (3.14)  can  be  useful  computationally 

4. 

when  A has  been  put  in  reduced  form  as  a preliminary  to  computing  A . 

We  have  from  (3.21)  that 

Bu  ■ An  • AikiAu  * • 


From  (3.13)  in  the  proof  of  Lemma  3.6  we  have 


d A^)  ♦ 0(!IE11I|(|E21I!) 
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and 


a f12j 


-h  + o(||e11iiiie12io 


11 


Hence  fran  (3.14) 


(3.23)  Bt  = 


Aii  * AiiEiiAii  + °(i|Enli2)  (aiiah)1e2i+  OCIIEiji'  lEzi") 


E12CA11A11:>1  + °(llE17HilEi,'l)  ^(A^Ai.A^)’1^ 


12  12 y -i2^inni^  21 

+ 0(||E11|l2||F12l|||E21ll) 


This  expression  is  in  perfect  agreement  with  (3.21)  when  the  E^j  are 
interpreted  appropriately  as  projections  of  E. 


Notes  and  references.  For  expository  reasons  the  results  of  this 
section  have  not  been  presented  in  the  historical  order  of  their  development. 
Penrose  (1955)  established  Corollary  3.4  using  techniques  that  do  not  give 
explicit  perturbation  bounds.  The  subject  was  revived  by  Golub  and  Wilkin- 
son (1966),  whose  interest  in  stable  algorithms  for  solving  least  squares 
problems  [cf.  Golub  (1965)]  led  them  to  derive  first-order  perturbation 
bounds  for  least  squares  solutions  (more  of  this  later).  The  first  pertur- 
bation bounds  for  the  pseudo- inverse  itself  are  given  by  Ben-Israel  (1966), 
who  restricts  his  class  of  perturbations  so  that  (in  reduced  form  ) only 
E^  is  nonzero.  More  general  theorems  for  acute  perturbations  were  estab- 
lished by  Hanson  and  Lawson  (19^9),  Pereyra  (1969),  and  Stewart  (1969). 
Theorem  3.7  is  a refinement  and  extension  of  Stewart's  bound. 


a 


J 
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The  decompositions  (3.3)  and  (3.4)  and  the  consequent  Theorem  3.4 
are  due  to  Wedin  (1973).  Theorem  3.3  is  a slight  extension  of  these 
results.  Theorem  3.1  is  also  due  to  Wedin  (1973),  although  a slightly 
restricted  form  of  the  result  may  be  found  in  Stewart  (1969).  In  an 
earlier  report  Wedin  (1969)  considers  the  sharpness  of  the  constants  q. 
in  Theorem  3.4  and  shows  that  for  the  spectral  norm  q cannot  be  made 
smaller. 

Early  differentiability  results  have  been  given  by  Pavel-Parvu  and 
Korganoff  (1969)  and  Hearon  and  Evans  (1968).  Wedin  (1969)  derived  the 
formula  (3.22)  as  we  did  from  the  decomposition  (3.4).  The  same  result 
for  functions  of  several  variables  was  derived  independently  by  Golub 
and  Pereyra  (1973)  in  connection  with  separable  nonlinear  least  squares 
problems.  For  further  references  see  Golub  and  Pereyra  (1975). 

4.  Projections 

In  this  section  we  shall  consider  how  the  projection  P^  varies  with 
A.  Since  P^  = AA+,  it  might  be  thought  that  the  perturbation  theory  for 
PA  could  be  derived  from  the  theory  developed  in  the  last  section 
for  a"*".  However  this  approach  gives  too  much  away,  and  sharper  bounds 
may  be  obtained  by  working  directly  with  one  of  the  decompositions  of  B . 
In  particular  we  shall  work  with  the  decomposition  (3.15)  based  on  the 
reduced  forms  of  A and  B. 

If  R(A)  and  R(B)  aTe  not  acute,  then  hPg-PAh 2 * 1*  Consequently 
we  can  restrict  ourselves  to  the  case  where  R(A)  and  R(B)  are  acute. 
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More  particularly  we  shall  only  consider  the  case  where  B is  an  acute 
perturbation  of  A. 

Theorem  4.1.  Let  B be  an  acute  perturbation  of  A,  and  let  < 
and  r be  defined  as  in  Theorem  3.8.  Then 


HE 


21  2 


(4.1) 


llPB'PA^2  5 


Y IIAII 


V HAH  2 / _ 


< 1 . 


Proof.  With  F21  defined  as  in  the  last  section  we  have  [cf.  (3. IS)] 


R(B)  - R 


(U 


The  matrix 


C21)ci*^iFa)’1(i  ^ 


is  a Hermitian  idempotent  whose  column  space  is  R(B);  hence  it  is  Pg. 
It  follows  that 


(4.2)  PB  - PA 


*I+F21F2p  ‘ 1 (I^F21F21^  lp21 

F21(I+F21F21)1  F21 ^l4F21F21^  *lp21 


from  which  it  is  easily  verified  that 
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(4.3)  (pb-pav 


^21F21(I+F21F21) 


\ 


F21(I+F21F21)  lp21 


Now  the  nonzero  singular  values  of  the  diagonal  blocks  in  (4.3)  are  given  by 

T 


1 * oJ(F21) 


where  the  o^F^)  are  t^ie  nonzero  singular  values  of  result 

follows  from  the  fact  that  the  largest  singular  value  Oj  of  F21  satisfies 

< "E21,(2 

Oi(f2i)  - Hf21II2  • □ 


Ill'll  9 


In  terms  of  projections,  the  bound  (4.1)  can  be  written  in  the  form 


< K™Ah 


5 r 

. 1 K Wi  \2  ‘ 

1 

\ Y ‘TO7  / J 

T/2 


1 . 


The  bound  is  interesting  in  several  ways.  First  it  depends  not  at  all  on 
F.12  and  E22‘  Second  its  dependence  on  is  only  through  the  constant 

y (in  fact  the  term  k/y  can  be  replaced  throughout  by  HB^II^IAI^) . Third 
the  bound  is  always  less  than  unity.  Finally,  it  goes  to  zero  along  with 
E2j.  We  may  summarize  this  last  observation  in  the  following  corollary. 

Corollary  4.2.  Regarding  B as  variable,  a sufficient  condition  for 


lim  PD  * P. 
B-*A  B A 
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is  that  rank  (A) 


rank  (B),  ||A+||2I|PABRAH2  < 6 < 1,  and 


lim  PtBR.  = 0 . 
B-*A  A A 


Asymptotic  forms  and  derivatives.  Asymptotic  forms  may  be  obtained 
in  the  usual  way  from  (4.2).  Indeed 

I 0(||E21l!2)  * 0(||E21II3) 

PB‘PA= 

\ F2i  + OCHE^i11  ) 0(||E21ir) 

In  terms  of  projections 


It  follows  that 


then  PAft)  is 


PB  ' PA  * P1ERAA+  * AAl  * °CIIPiERAl|2)  . 

if  A(t)  is  differentiable  and  varies  without 
differentiable  and 


changing  rank, 


(4.4) 


^A 

IF 


PA  37 


^ R.A+ 


A+\ 


dAH 

■JT 


Notes  and  references.  Theorem  4.1  and  its  corollary  appear  to  be  new. 
The  expression  (4.4)  for  the  derivative  of  PA  was  first  given  by  Golub  and 
Pereyra  (1973). 

5.  The  Linear  Least  Squares  Problem 

In  this  section  we  shall  derive  perturbation  bounds  for  the  least 
squares  problem  of  minimizing  ||b-Ax|| 2-  Although  the  solution  of  minimum 
norm  is  given  by  x - Afb,  the  perturbation  theory  of  §3  again  does  not  Rive 
the  best  possible  results. 
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We  shall  assume  throughout  this  section  that  B is  an  acute  pertur- 
bation of  A,  and  we  shall  work  with  the  reduced  form  of  the  problem. 

In  this  form  x is  replaced  by  V^x  and  b is  replaced  by  t/\>  (cf.  52). 
If  x and  b are  partitioned  in  the  forms 


where  x^.b^  € Cr  then 

(5.1)  = Ajjb2 

and 


Moreover  the  norm  of  the  residual  vector 

r = b - Ax 


is  given  by 

llr|l2  « ||b2ll2. 

In  the  theorems  to  follow  we  shall  freely  use  the  definitions  made 
in  the  previous  sections  [e.g.,  < and  y] . One  additional  piece  of  nota- 
tion will  be  needed;  namely,  we  shall  define  p as  that  nonnegative  constant 
such  that 

llbjllj  - t)||A||2I|x||2  . 


Mi 
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Since  = A^x,,  we  have  i]  < 1,  Also  ||xl|  5 l|A+|| ||b1 1| , which  shows  that 

-1  t 

t)  2 k . When  A is  ill-conditioned,  that  is  when  A is  large,  the 

vector  x may  be  either  large  or  small.  In  the  first  case  r)  is  near 

zero,  and  we  shall  say  that  "x  reflects  the  ill-condition  of  A." 

We  first  consider  perturbations  in  the  vector  b. 

4-  4- 

Theorem  5.1.  Let  x = A b and  x + h = A1 (b+k).  Then 

IlhIL  IIP.klL 

(5-2)  wr2~ K • 

Proof.  With  the  obvious  partitioning  of  k we  have  h = Ajjk^,  so  that 
(5.3)  ||h|l2  £ llAjJlIUkjll  . 

But  ||x|| £ = 'H  ^"llb^ll 2/ IIA|| 2 > which  combined  with  (5.3)  yields  (5.2).  □ 

Theorem  5.1  shows  that  the  perturbation  in  x is  determined  by  the 
projection  of  k onto  R(A).  However,  P^k  is  normalized  by  !|P^blU, 
and  if  this  latter  quantity  is  small,  the  perturbation  may  be  large.  Since 

llbll  2 = 'IPAb(f|  ♦ llrfl  2 , 

this  observation  may  be  sunmarized  by  saying  that  large  residuals 
are  troublesome,  a statement  which  will  be  amply  supported  later. 

Since  r)  can  be  as  small  as  < \ the  number  < cannot  be  taken 
as  a condition  number  for  perturbations  in  b without  further  qualification. 
If  x does  not  reflect  the  ill-conditioning  of  A,  then  r)  is  near  unity 
and  < is  a condition  number.  Otherwise  the  solution  will  be  relatively 
insensitive  to  perturbations  in  b. 
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We  next  turn  to  assessing  the  effects  on  x of  a perturbation  in  A. 

Theorem  5.2.  Let  x = A+b  and  x + h = BEb,  where  B = A + E is  an 
acute  perturbation  of  A.  Then 


(5.4) 


HE, 


HE, 


2 ||E, 


1 2 k "*11"2  . , /K  i|1j12m2  \ . !|t21!l2  / :b2ll2  ^ 'E2l  ’’ 

Hxirj  - y 


v ^ 


Proof.  Write 


(5.5) 
Then 

(5.6) 
and 

(5.7) 


h = Ji2^Bn'An^bi  + fJ12'I12')Allbl  + J12Bll^J2l"I21-)b 


HE, 


!l J1 2 (Bn  A1 1 ) bl ! 2 5 7 l|AH 22  llxll2  ’ 


t t -1  / < F1 2^  2 \ 

(Jl2-Ii2)Anbi  2 5 ^(y  II All 2 ) 11X11 


Now 


(5.8) 


J12Bll(J2rI21)b  = J12B11^I+F21F21) 


+ j12Bii(I+F21F21)  F21b2  * 


To  bound  the  first  term  in  (5.8),  note 


that  (I*F,2IiF21)-1  - I ■ 


Hence 


r j 


40  - 


(5.9) 


IIJIzBn[<I*F^iF2i>-I>bi»z 

= ||B;}ll2ll(I.^1F21)-1|l2ll^1ll2l|F21b1l|2 


- IIBnll2l|F2i¥E2iBiibill2 


-12  2 rJ|E21B2l2 

= IIB1ill||[E21l[|]|x|l2  = »' 


For  the  second  term  in  (5.8)  we  have 


(5.10) 


IIJ21B;JtI*F21F21>'lF21b2»2 

5 ||B‘J||2!lE21ll!|b2l| 

- l|B:j|l2|!E.1IU  !!b2!l2  i|x||2 

11  2 21  2 HOT11  WT 


5 ^ I ^ 


($) 


12 

2 !IE21II2  llb2 


OT^“TE]T  i!X  2 


The  bound  (5.4)  follows  on  combining  (5. 5)- (5. 10) . o 


The  first  two  terms  in  (5.4)  are  unexceptionable.  The  first  term 
corresponds  to  the  classical  result  for  linear  systems  and  is  the  only 
nonzero  term  when  A is  square  and  nonsingular.  The  second  term  depends 
on  P^ER^  and  vanishes  when  A is  of  full  column  rank,  as  it  is  in  many 
applications. 

The  third  term  requires  more  explanation.  If  terms  of  second  order 
in  ,|h21l!  are  ignored,  this  expression  becomes  essentially 


1|b2,,2  l|L21!,2  k*  "E21H2 

wjr2  “W7‘7Tltaneiw7  * 


(5.11) 
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2 2 

where  0 is  the  angle  subtended  by  b and  R(A) . The  number  k p tan  e/y 
can  vary  from  0 to  It  is  small  when  0 is  small  (i.e.  the  residual 
vector  is  small).  It  is  also  reduced  in  size  when  x reflects  the  ill- 
conditioning  of  A so  that  r|  s * when  x does  not  reflect  the  ill- 

7 

conditioning  of  A and  0 is  significant,  it  is  of  order  k , thus  making 
the  third  term  in  (5.4)  the  dominant  one. 

We  have  bounded  the  third  term  in  the  decomposition  (5.5)  in  such  a 
way  as  to  reflect  its  behavior  when  is  small.  In  fact  it  is  bounded 

for  all  values  of  E^,  and  the  third  term  in  (5.4)  may  be  replaced  by 

k l|b|l2  . /k1|F'21,!2\ 

yr]W^2 

The  residual.  Since  the  residual  vector  is  given  by  r = F^b,  the  theory 
of  §4  may  be  applied  to  give  perturbation  bounds  for  the  residual.  Specifi- 
cally, if 

i - B+b 
and 

? = b - Bx  = Pgb  , 

then 

||f -r|| 2 < ||PB-PAl!2l!bl|2 


and  IIPg-P^H £ can  be  bounded  by  (4.1)  in  Theorem  4.1. 


r 
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In  applications  one  may  not  be  interested  in  r;  rather  one  is 
interested  in  the  residual  r of  x with  respect  to  the  matrix  A: 

r * b - Ax  . 

If  we  write 


r - r = (PB'PA)b  - Ex  , 

then 


llr-r||2  < HPB-PA!!2]lbll2  ♦ IIE||2I|*H2  . 


Theorem  5.1  provides  the  necessary  estimate  of  x. 

If  we  concern  ourselves  with  only  the  change  in  Mril 2 we  can  derive 
a slightly  stronger  result.  Since  r is  the  minimizing  residual,  we  have 
llrll 2 £ ||r|| 2*  Likewise  ||b- (A+E)x|l2  £ |!b-  (A+E)xll2>  from  which  it  follows  that 

l|r|| 2 < ||r|| 2 £ llrl|2  ♦ ||E|l2(||x|l2+|lxll2)  . 

Asymptotic  forms  and  derivatives.  An  asymptotic  form  for  the  perturbed 
least  squares  solution  x can  be  obtained  from  (3.4): 


(5.12) 


x - x - A+PaERax  - R^EHPA(AH)tx 

♦ (AHA)+RA^Pjb  ♦ o cn eii  2)  . 


An  equivalent  asynytotic  formula,  which  may  be  useful  in  computational  work, 
can  be  derived  from  the  reduced  form  (3.23).  The  derivative  formula  corre- 
sponsing  to  (5.12)  is 
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dx 

3? 


-A+P  ^ r X - nx  ***  p rA^+v  + fAH.^0  ^Pt, 
APA  JF  RAX  RA  "cPT  PA(A  ) x + (A  A)  ra  -37  PAb  • 


An  inverse  perturbation  theorem.  Theorem  5.1  shows  how  a perturba- 
tion in  A can  affect  the  least  squares  solution.  Here  we  consider  the 

* 

question:  given  a vector  x under  what  conditions  is  x the  least 

squares  solution  of  a slightly  perturbed  problem?  One  such  condition  is 
given  in  the  following  theorem. 

Theorem  5.2.  Let  x € C11  be  given.  Let  x = A+b,  r = b - Ax,  and 
r = b - Ax.  If 

l!r|||  = ||r || 2 * e2  , 

then  there  is  a matrix  E of  rank  unity  with 


(5.13) 


such  that  ||b- (A+E)x||2  is  a minimum. 


Proof.  Let 


e ■ ? - r * (x-x)  ( R(A)  . 


Since  r € R(A)1, 

II f II 2 - llrt! 2 ♦ Hell | , 

which  shows  that  ||ell2"  e.  Let 


Then  E satisfies  (5.13)  and  R(E)  c R(A).  Hence  R(A+E)  <-  R(A).  But 


b - (A+E)x  - r € R(A)a  , 

which  shows  that  the  residual  b - (A+E)x  ( R(A+E)X,  and  x solves  the 
required  least  squares  problem,  a 

A consequence  of  this  theorem  is  that  there  is  little  use  hunting  for 
the  exact  minimizing  x.  Provided  the  residual  is  nearly  minimal,  the 
approximate  solution  x,  however  inaccurate,  is  the  exact  solution  of  a 
slightly  perturbed  problem. 

It  is  sometimes  desirable  that  the  perturbation  matrix  E in  Theorem. 
5.2  not  alter  some  of  the  columns  of  A (e.g.  a column  may  be  dates  in 
years).  This  can  be  done  as  follows.  Let  X be  the  vector  obtained  from 
x by  setting  to  zero  the  components  corresponding  to  the  columns  that  are 
not  to  be  disturbed.  Then 


E « 


ex 


.H 


11*115 

is  the  required  matrix.  Of  course  HXIU  5 llx^  so  that 
||Eli 2 may  still  be  small  enough  for  practical  purposes. 


||E|!  2 ? II Ell  2 ; however 


Notes  and  references.  Much  of  the  perturbation  theory  for  pseudo- 
inverses has  been  a byproduct  of  the  search  for  bounds  for  the  linear  least 
squares  problem.  Golub  and  Wilkinson  (1966)  gave  a first  order  analysis 
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of  the  problem  aiid  were  the  first  to  note  the  dependence  of  the  solution 

2 

on  ic  . Rigorous  upper  bounds  were  derived  by  Hanson  and  Lawson  (1969) , 
Pereyra  (1969),  and  Stewart  (1969).  Wedin  (1969)  also  gives  bounds.  More 
recent  treatments  have  been  given  by  Lawson  and  Hanson  (1974)  and  Abdelmalek 
(1974).  Van  der  Sluis  (1975)  was  the  first  to  point  out  the  mitigating 
effect  of  r)  in  (5.11). 

The  inverse  perturbation  theorem  is  new. 
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